A MST algorithm for source detection in γ-ray images
ABSTRACT

We developed a source detection algorithm based on the Minimal Spanning Tree
(MST), a graph-theoretical method useful for finding clusters in a given set of
points. The algorithm is applied to two-dimensional γ-ray images, where the points
correspond to the arrival directions of photons and the possible sources are
associated with the regions where they cluster. Some filters to select these clusters
and to reduce spurious detections are introduced. An empirical study of the statistical
properties of the MST on random fields is carried out in order to derive criteria for
estimating the best filter values. We also introduce two parameters useful to verify
the goodness of candidate sources. To show how the MST algorithm works in practice,
we present an application to an EGRET observation of the Virgo field, at high galactic
latitude and with a low and rather uniform background, in which several sources are
detected.

Key words: gamma rays: observations - methods: data analysis 



1 INTRODUCTION 



Telescopes for satellite-based high energy γ-ray astronomy
detect individual photons by means of the electron-positron
pair that they generate in the detector. From the pair
trajectories it is possible to reconstruct the original
direction of the photon with an uncertainty that decreases with
the energy, from a few degrees below 100 MeV to less than
a degree above 1 GeV. This technique was applied in the
past γ-ray observatories SAS-2 (Fichtel et al. 1975), COS-B
(Bennett 1990) and EGRET-CGRO (Kanbach et al. 1988;
Thompson et al. 1993), all equipped with spark chambers.
Pair tracking is also used in the current AGILE mission
(Tavani et al. 2006) and in the LAT telescope on board the
forthcoming GLAST mission, both employing silicon microstrip
detectors (Gehrels et al. 1999). The resulting product is an image
where each photon is associated with a direction in the sky:
discrete sources thus correspond to regions in which the
number of photons is higher than in the surroundings.
When the size of this region is consistent with
the instrumental Point Spread Function the source is
considered point-like, otherwise it can be extended or a group
of nearby sources.

Various algorithms are applied to the detection of point-like
or extended sources in γ-ray astronomy: the most extensively
used one is based on the Maximum Likelihood
(Mattox et al. 1996), whereas others based on Wavelet Transform
analysis (Damiani et al. 1997), Optimal Filter (Sanz et
al. 2001), Scale-Adaptive Filter (Herranz et al. 2002), etc.,
were applied to real and simulated data to study
their performance. Some of them are based on deconvolution
of the instrumental Point Spread Function
(PSF). Many methods work directly on the pixellated images,
i.e. count or intensity maps. Other methods search for
clusters in the arrival directions of photons that, if statistically
significant, are considered an indication of a source.

The approach considered here is essentially a cluster
search based on a minimal spanning tree (MST) algorithm.
This technique has its roots in graph theory, and highlights
the topological pattern of connectedness of the detected
photons. Given a graph G(V, E), where V is the set of vertices
(or nodes) and E is the set of weighted edges connecting
them, a MST (Kruskal 1956; Prim 1957; Zahn 1971) is the
tree (a subgraph of G without closed circuits) that connects
all the points with the minimum total weight, defined as the
sum of the weights of the tree's edges. In a data set
consisting of points in a Cartesian frame of reference, we can
consider them as the nodes of a graph, the edges being the
lines joining the nodes, weighted by their length.

The MST method was originally proposed for γ-ray
source detection by Di Gesù and Sacco (1983), who also
investigated its statistical properties in uniform fields. This
work was developed by Di Gesù and Maccarone (1986), and
De Biase et al. (1986) applied the MST to the detection of extended
sources in EXOSAT X-ray images. Other authors applied
MST methods to the search for galaxy clusters, both in
2- and 3-dimensional surveys and simulations (Barrow et al.
1985; Bhavsar & Ling 1988a,b; Plionis et al. 1992;
Krzewina & Saslaw 1996; Doroshkevich et al. 2001, 2004), and
showed the capabilities of the method as a filament-finding
algorithm.

In this paper we investigate the MST approach to γ-ray
source detection, and present a new study of its statistical
properties and the definition of selection criteria. We also
introduce some parameters useful to classify the reliability
of detected clusters to be associated with source candidates.
We would like to emphasize here that this method is not
an alternative to other source detection algorithms, but is
complementary to them, in the sense that it can give a list of
possible candidate sources (identified via the clustering
properties of their photons) that could be further investigated by
other means.

This paper is structured as follows. In Sect. 2 we de- 
scribe our MST algorithm, and in Sect. 3 and 4 we investi- 
gate by means of numerical simulations the statistical dis- 
tributions of edge length and node number, and we intro- 
duce some criteria useful for the source detection with our 
method. An example of application to an EGRET field is 
shown in Sect. 5, while in Sect. 6 we summarize and discuss 
our results. 



2 THE MST ALGORITHM 

The result of an observation performed by a γ-ray telescope
is a photon list containing for each event the arrival direction
coordinates, time, energy, and other useful parameters.
The celestial coordinates (Right Ascension and Declination) of
every photon define a point in a two-dimensional frame, which
can be considered a node of the graph. The edge weight
Λ is the angular distance between a pair of nodes.
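
As an illustration of the edge weight, the angular distance between two photons can be computed from their celestial coordinates. The sketch below uses the standard haversine formula; the function name `angular_separation` is hypothetical, and the text does not state which distance formula the original code adopts.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle distance, in degrees, between two sky positions.

    All arguments are in degrees. Haversine form, which is numerically
    stable for the small separations typical of clustered photons.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Guard against rounding pushing sqrt(h) marginally above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))
```

For a small field close to the celestial equator a flat Euclidean distance in (RA, Dec) gives nearly the same edge weights, so this function matters mostly for wide or high-declination fields.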

The simplest way to find the MST of the field is a
version of the Prim algorithm (also known as the DJP algorithm;
Prim 1957): it starts from an arbitrarily selected node, finds
its nearest neighbour and connects them with an edge, which
is the first edge of the MST. It then finds the point that
is nearest to any point already connected in the
MST and adds the corresponding edge. After N − 1 iterations,
where N is the total number of points, the complete MST is found.
Faster and computationally optimized algorithms exploit other
theoretical properties of the MST, like its being a subset of
the Delaunay triangulation of the points (Delaunay 1934).
In particular, we used a fast code for the MST computation
that is freely available from the Boost¹ and CGAL² libraries.

¹ http://www.boost.org
² http://www.cgal.org
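
The simple Prim procedure described above can be sketched in a few lines. This is a minimal O(N²) illustration on a flat plane with Euclidean edge weights, not the optimized Delaunay-based code used for the actual computations; the function name `prim_mst` and the edge format `(i, j, length)` are choices made here for illustration.

```python
import math

def prim_mst(points):
    """Return the MST edges [(i, j, length), ...] of a list of 2-D points."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest distance from each node to the tree
    best_from = [-1] * n        # tree node realising that distance
    in_tree[0] = True           # arbitrary starting node
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the node outside the tree that is closest to the tree.
        k = min((j for j in range(n) if not in_tree[j]),
                key=best_dist.__getitem__)
        edges.append((best_from[k], k, best_dist[k]))
        in_tree[k] = True
        # The new tree node may offer a shorter link to the remaining nodes.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges
```

The loop runs exactly N − 1 times and adds one edge per iteration, matching the description in the text. For photon lists of many thousands of events the quadratic cost becomes the bottleneck, which is why Delaunay-based approaches are preferred in practice.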



Once the MST has been found, the following operations must be
performed to extract only the locations where the photons
cluster, i.e. the possible sources, and to evaluate the
residual photon background:

• Separation: remove all the edges having Λ greater than
a selected separation value Λ_c, usually chosen in units
of the mean edge length Λ_m of the MST. As a result we
obtain a set of disconnected sub-trees.

• Elimination: remove all the sub-trees having a number
of nodes N_n less than or equal to a threshold value N_c.
This filter removes small chance clusters of nodes,
leaving only the clusters that have a high probability of being
genuine sources.
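
The two filters can be sketched as follows, assuming the MST is available as a list of `(i, j, length)` edges over nodes numbered from 0. The function name `filter_clusters` and the union-find bookkeeping are illustrative choices, not taken from the original code.

```python
def filter_clusters(n_nodes, edges, lambda_c, n_c):
    """Apply separation at lambda_c and elimination at n_c.

    edges: iterable of (i, j, length) MST edges.
    Returns the surviving sub-trees as lists of node indices.
    """
    parent = list(range(n_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Separation: edges longer than lambda_c are removed, so only the
    # remaining ones are allowed to join nodes into the same sub-tree.
    for i, j, length in edges:
        if length <= lambda_c:
            parent[find(i)] = find(j)

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)

    # Elimination: sub-trees with N_n <= n_c are discarded.
    return [nodes for nodes in groups.values() if len(nodes) > n_c]
```

For example, with a chain of five nodes whose third edge is long, `filter_clusters(5, [(0, 1, 0.1), (1, 2, 0.1), (2, 3, 1.0), (3, 4, 0.1)], 0.5, 2)` keeps only the three-node sub-tree.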

After the application of these filters, the remaining
sub-trees correspond to possible sources. An estimate of the
source position is obtained by computing the centroid of
the sub-tree nodes (i.e. the mean value of the Right Ascension
and Declination of all points in the sub-tree). A
refined source position can be found by computing the
centroid of all the points lying inside the circle centered on the
previously calculated sub-tree centroid, with a radius equal to
the distance of the farthest point in the sub-tree, in order to take
into account possible photons that belong to the source
but were accidentally filtered out.
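
The two position estimates can be sketched as below, assuming flat (x, y) coordinates and Euclidean distances. The function names `centroid` and `refined_position` are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def refined_position(all_points, cluster_points):
    """Centroid of every field point inside the circle that is centred on
    the sub-tree centroid and reaches its farthest node."""
    c = centroid(cluster_points)
    radius = max(math.dist(c, p) for p in cluster_points)
    inside = [p for p in all_points if math.dist(c, p) <= radius]
    return centroid(inside)
```

Here `all_points` is the full photon list, including the sub-tree nodes themselves, so the refined estimate can only gain points with respect to the first one.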

An example of this procedure is shown in Fig. 1, where
the upper-left panel shows a frame containing N_tot = 500
points within a square region of unit side: two clusters
having different numbers of points have been added to a
randomly generated point distribution. The first one,
representative of a "strong" source, has 80 points with a circular
Gaussian distribution of σ = 0.1; the second one, the "faint" source,
has 20 points distributed in a similar way. The random
"background" thus has 400 points. The upper-right panel
shows the MST that connects all the points. The lower-left
panel shows the clusters detected after separation
with Λ_c = 1.3 Λ_m and elimination with N_c = 7. In this case
a few small clusters are detected, which disappear when
more appropriate filters are used (Λ_c = Λ_m, N_c = 10,
lower-right panel), whereas the two genuine sources remain. Their
positions, computed from the sub-tree centroids, lie at a
distance smaller than 0.01 from the true ones, confirming the
validity of this method for evaluating the source coordinates.

The two major points of the MST source detection are 
therefore the choice of the two filtering parameters and the 
methods to evaluate the significance of the residual sub- 
trees. 



3 MST STATISTICAL PROPERTIES 

3.1 The length distribution in the MST 

According to Barrow et al. (1985), a useful criterion to
distinguish a random (Poissonian) field from a field with
some sources is the shape of the frequency distribution of
the edge length in the MST. These authors suggest that for a
random field this distribution has an approximately Gaussian
shape peaked around the mean edge length Λ_m. We studied
this distribution, whose statistical properties are useful to
choose the best filtering parameters. In Fig. 2 we present the
frequency distributions of x = Λ/Λ_m computed for a frame
Figure 1. Upper left: A set of 500 randomly generated points, with two simulated sources. Upper right: The Minimal Spanning Tree
between these points. Lower left: Cluster selection after separation with Λ_c = 1.3 Λ_m and elimination with N_c = 7. Lower right: cluster
selection with the filters Λ_c = Λ_m, N_c = 10. The added "sources", at coordinates (0.3, 0.3) and (0.7, 0.7), are marked by diamonds.
Circles are centered on the centroids of the remaining sub-trees (squares) and have radii equal to the distance of the farthest node in
the sub-tree. The dot is the refined source position, see text for details.



with a random field (upper panel) and for the same frame with
five sources added (lower panel): in the latter case there is a
clear excess of short distances (within the clusters that mark
the sources) and of long distances (between the clusters)
with respect to the random case, and the histogram shows
an evident left-side asymmetry.

A useful indicator for the presence of sources is the mean
value of the MST edge length Λ_m. Earlier investigations (Gilbert
1965) found that the total length of a random MST is
proportional to (A N_tot)^(1/2), where A is the field area and N_tot is
the total number of points. A theoretical upper limit to the
proportionality constant was found to be 2^(-1/2) ≃ 0.70. Our
Monte Carlo simulations showed that the constant is
rather ≃ 0.65. Since the MST has N_tot − 1 edges, the mean
edge length for a random-field MST is:

$$ \Lambda_m \simeq 0.65 \sqrt{\frac{A}{N_{\rm tot}}} \qquad (1) $$

Thus, if the mean length for a field deviates from this value,
it is an indicator of non-random clustering, i.e. of the
presence of sources.
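
As a quick numerical check of this criterion, the expected mean edge length of a random field can be compared with the measured one. The helper names below are hypothetical, and the default constant 0.65 is the Monte Carlo value quoted in the text.

```python
import math

def expected_mean_edge(area, n_tot, constant=0.65):
    """Expected mean MST edge length for a random field (Eq. 1)."""
    return constant * math.sqrt(area / n_tot)

def mean_edge_ratio(observed_mean, area, n_tot):
    """Observed mean edge length in units of the random-field expectation.

    Values well below 1 suggest that the points are more clustered
    than in a random field with the same density.
    """
    return observed_mean / expected_mean_edge(area, n_tot)
```

For the 500-point unit-square frame of Fig. 1 the random-field expectation is about 0.029, so a measured Λ_m noticeably smaller than this would hint at the presence of clusters.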

Another test for the occurrence of sources is the
evaluation of the skewness coefficient β_s of the distribution f(x).
In the two cases of Fig. 2 we found β_s equal to 0.16 and
0.46; the higher value is due to the decrease of the mean
length Λ_m and to the occurrence of x values greater than
≈ 2.5 when sources are present. From our simulations we
found that β_s higher than 0.2 can be considered a good
indicator for the presence of sources.
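
A minimal sketch of this test is given below. It assumes that β_s is the standard moment coefficient of skewness, m3 / m2^(3/2); the text does not spell out the definition, so this choice and the function name `skewness` are assumptions.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

In use, `values` would be the list of x = Λ/Λ_m for all MST edges, and the result would be compared with the empirical threshold of 0.2 quoted above.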

For an accurate study of the edge length distribution it 
is useful to have a simple analytical formula to be applied 
in the computation. Since theoretical works on this subject 
are not easily available in the astronomical literature, we 
followed a numerical approach. 

First we generated a pure random frame containing 10^6
points to smooth the fluctuations in the histogram; the
resulting frequency plot is given in Fig. 3. Note that, as
in Fig. 2, it has a well defined mode, a small skewness and
a very small tail for x > 2. Its shape is therefore not that of
a Gaussian. An approximate formula that gives an excellent
fit, although it is properly defined in the unlimited
interval [0, +∞), is a Rayleigh distribution suppressed at
large x by a factor similar to that of a Fermi-Dirac (FD)
distribution:



$$ f(x) = K \, \frac{x}{a^2} \exp\left\{ -\frac{x^2}{2a^2} \right\} \frac{1}{\exp\left( \frac{x - b}{c} \right) + 1} \qquad (2) $$



The parameter values were found by means of a numerical
best fit and the resulting formula is:

[Eq. (3): Eq. (2) with the best-fit parameter values; the numerical values are illegible in the source]    (3)

with a maximum error with respect to the data of less than 2%.
Figure 2. Upper panel: Histogram of the MST edge length, in
units of the mean length, for a random field with 1675 points.
Lower panel: Histogram of the MST edge length, in units of the
mean length, for the same field in which some strong sources have
been added. Note that there is a large left-side asymmetry with
respect to the random field.

We computed the values of the mode, the median, the vari- 
ance and other moments from this distribution, and found 
0.892 and 0.952 respectively for the first two, a variance 
equal to 0.208, whereas the skewness and the kurtosis are 
0.080 and 2.439, respectively. 

Another fitting formula can be obtained from Pearson
distributions (Smart 1958, chap. 7), again suppressed
at large x values by a FD factor:

$$ f(x) = K \left( \frac{x}{a_1} \right)^{b a_1} \left( 1 - \frac{x - a_1}{a_2} \right)^{b a_2} \frac{1}{\exp\left( \frac{x - c}{d} \right) + 1} \qquad (4) $$

where K is a normalization factor, a_1 is the value of the
mode, b and a_2 are free parameters, and c is the cut-off scale.
Unlike Eq. (2), this distribution is defined in the
finite interval x ∈ [0, a_1 + a_2]. Considering that values of x
larger than 3.0 are extremely rare, we imposed the condition
a_1 + a_2 = 3.2 and evaluated the remaining parameters. A
very good fit was obtained for a_1 = 0.91, b = 1.25, c = 1.8,
d = 0.18 and the normalisation factor K = 0.7676.

The edge distribution can be useful for the choice of
the separation parameter Λ_c. From Eq. (3) and Figure 3 we
can see that the choice of a low X_c = Λ_c/Λ_m, for instance
the value of 0.37, implies that about 90% of edges will be
Figure 3. Histogram of the MST edge length frequency, in units
of the mean length, for a random field with 10^6 points. Also
plotted is Eq. (3).



eliminated, and the majority of the remaining clusters will have
a number of nodes too small to satisfy the elimination
criterion. A good choice is to use a value close to unity: we found
from our simulations that the best range for X_c is between
0.8 and 1.2, corresponding to the cumulative probabilities
of 0.384 and 0.683, respectively. In fact, although the
probability of finding an edge shorter than ≈ Λ_m is still large, it is
unlikely that a high number of these edges will belong to a
single remaining cluster, and small clusters are therefore rejected by
the subsequent filtering.



3.2 Distribution of the number of sub-trees for a
given Λ_c in a random field

As shown by Di Gesù and Sacco (1983), the expected
total number of clusters obtained by cutting a random,
2-dimensional MST having N_tot points, at an edge length Λ_c,
is given by:

$$ N = 1 + (N_{\rm tot} - 1) \exp\left\{ -\pi \Lambda_c^2 N_{\rm tot} / A \right\} \qquad (5) $$

where N_tot/A is the density of nodes, which according to Eq.
(1) is proportional to 1/Λ_m². This is a monotonic decreasing
function of Λ_c, and we verified with Monte Carlo simulations the
consistency of this result.
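
The expected number of clusters can be evaluated directly. The sketch below assumes the form N = 1 + (N_tot − 1) exp(−π Λ_c² N_tot / A) for Eq. (5); the function name `expected_clusters` is hypothetical.

```python
import math

def expected_clusters(n_tot, area, lambda_c):
    """Expected number of sub-trees after cutting a random 2-D MST
    of n_tot points at edge length lambda_c (Eq. 5)."""
    density = n_tot / area
    return 1.0 + (n_tot - 1) * math.exp(-math.pi * lambda_c ** 2 * density)

# Example: 1000 points in a unit square, cut at the random-field mean
# edge length 0.65 * sqrt(A / N_tot), i.e. X_c = 1.
lambda_m = 0.65 * math.sqrt(1.0 / 1000)
n_clusters = expected_clusters(1000, 1.0, lambda_m)
```

With these inputs the exponent equals −π × 0.65², independent of N_tot, so roughly a quarter of the edges are expected to be cut at X_c = 1. Most of the resulting sub-trees are single nodes or very small groups, which is why the elimination filter is needed.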

We used a different approach, directly based on the
calculated mean edge length, and considered another
distribution, useful for selecting the best N_c parameter: that of the
number of clusters as a function of the number of nodes
after the application of a separation at the edge length Λ_c. We
computed several distributions in random fields via Monte
Carlo simulations and found that they can be well described
by an exponential function:

$$ T(N_n) = F(X_c) \, N_{\rm tot} \exp\left\{ -\kappa(X_c) N_n \right\} \qquad (6) $$

where T(N_n) is the total number of sub-trees having N_n
nodes each and X_c = Λ_c/Λ_m. Some examples, corresponding
to different choices of the cut length Λ_c, are shown in Fig.
4. We see how the mean number of big clusters decreases
when the cut length becomes smaller than the mean MST
edge length: that is explained by the fact that separating at
Figure 4. Average number of sub-trees obtained separating a
1000 points random field at 0.8, 1.0 and 1.2 times the mean MST
length.
Figure 5. Frequency distribution of the clustering degree for the
residual clusters, for N_c = 12 (white), 16 (green/light dark) and
20 (red/dark).



smaller lengths (thus removing more edges from the MST)
we tend to fragment the tree into a larger number of small pieces.

Considering the MST of a field with a total number of
points N_tot and applying a separation at Λ_c, we can
establish a useful lower limit for the elimination value N_c by
comparing it with a random field with the same number of
points and separated at the same length. A simple criterion
is to choose the N_c value for which, in the corresponding
random field, on average there is only one sub-tree with the
same node number, i.e. T(N_c) = 1:

$$ N_c = \frac{\ln\left[ F(X_c) \, N_{\rm tot} \right]}{\kappa(X_c)} \qquad (7) $$

Another possibility is to use the value N* for which the
expected number of residual chance clusters is less than unity,
which is obtained by integrating the distribution of Eq. (6):

$$ \int_{N^*}^{\infty} T(N_n) \, dN_n = \frac{F(X_c) \, N_{\rm tot}}{\kappa(X_c)} \, e^{-\kappa(X_c) N^*} \leq 1 \qquad (8) $$

Note that N* is slightly greater than the value given by
Eq. (7). Of course, the choice N_c > N* would give a higher
confidence in the source detection, but the risk of
eliminating true faint sources increases.

The two functions F(X_c) and κ(X_c) are characterized
by a monotonic decreasing behaviour and are well described
by the following power laws:

$$ F(X_c) \simeq 0.2 \, X_c^{-3.75} \qquad (9) $$

and

$$ \kappa(X_c) \simeq 0.5 \, X_c^{-1.95} \qquad (10) $$

In a random field we have F(X_c) = 0.461, 0.200, 0.101 and
κ(X_c) = 0.77, 0.50, 0.35 for X_c = 0.8, 1.0, 1.2, respectively.
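
The two elimination thresholds can be worked out numerically. The sketch below assumes the exponential form T(N_n) = F(X_c) N_tot exp(−κ(X_c) N_n) and uses only the three tabulated (F, κ) pairs quoted above rather than the power-law fits; the names `RANDOM_FIELD`, `single_cluster_threshold` and `n_star` are hypothetical.

```python
import math

# (F, kappa) for random fields at X_c = 0.8, 1.0 and 1.2, as quoted in the text.
RANDOM_FIELD = {0.8: (0.461, 0.77), 1.0: (0.200, 0.50), 1.2: (0.101, 0.35)}

def single_cluster_threshold(n_tot, x_c):
    """N_c for which T(N_c) = 1, i.e. one random sub-tree of that size."""
    f, kappa = RANDOM_FIELD[x_c]
    return math.log(f * n_tot) / kappa

def n_star(n_tot, x_c):
    """N* for which the integral of T(N_n) from N* to infinity equals 1."""
    f, kappa = RANDOM_FIELD[x_c]
    return math.log(f * n_tot / kappa) / kappa
```

For a 1000-point field separated at X_c = 1.0 this gives a threshold of about 10.6 from the first criterion and N* of about 12 from the second, so N* is indeed the slightly larger of the two.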

4 CLUSTERING PARAMETER AND 
DETECTION STABILITY 

4.1 Clustering parameter 

Once a list of candidate sources is found, it is useful to intro- 
duce some criteria to select, among the sub-trees remaining 



after the application of the filters, those corresponding to 
the best candidate sources and to reject clusters with a high 
chance to be randomly originated. 

A first parameter is the "clustering degree", which we
define as g = Λ_m/Λ_m,tree, i.e. the ratio of the mean edge
length in the whole MST to the mean edge length in the
sub-tree. The more clustered a sub-tree is (and thus the more
likely the candidate source is to be true), the smaller its
mean edge length Λ_m,tree and the larger the value of
the clustering degree g.
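
Given the edge lengths of the whole MST and those of one surviving sub-tree, g follows directly from its definition. The function name `clustering_degree` is hypothetical.

```python
def clustering_degree(all_edge_lengths, subtree_edge_lengths):
    """g = (mean edge length of the whole MST) / (mean edge length of the sub-tree)."""
    mean_all = sum(all_edge_lengths) / len(all_edge_lengths)
    mean_sub = sum(subtree_edge_lengths) / len(subtree_edge_lengths)
    return mean_all / mean_sub
```

Note that `all_edge_lengths` refers to the complete MST before separation, so the same Λ_m normalises every sub-tree of a given field.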

We tested how g works by investigating its distribution for
clusters in a random field. We generated 1000 fields of 1000
points each and evaluated g for the remaining clusters, after
a separation at Λ_c = Λ_m and an elimination at N_c = N* ≈ 12.
In Fig. 5 we show the histogram of the resulting
distribution of g for the residual clusters, about one for each field as
expected. It has an approximately Gaussian shape, although
with asymmetric tails. The maximum is around g = 1.5
and the skewness has the low value of ≈ 0.2. An
acceptable fit can be obtained with the same Pearson distribution
used in Eq. (4), without the exponential suppression factor.
For comparison, the distributions of g obtained when the
elimination value is raised to N_c = 16 and 20 are also shown.
It is evident that the mean clustering degree of the residual
clusters is still around g = 1.5, but the frequency of these
clusters is much lower. We can conclude that, in a "true"
field with the same number of points and separated at the
same length, clusters with g > 1.7 combined with a number
of nodes sufficiently higher than N* are good candidates to
be genuine sources. For example, in our simulations, eliminating
at N_c = 12, 16 and 20 we have frequencies of
random clusters with g > 1.7 of about 20%, 4% and 0.5%,
respectively. A higher threshold value of g would result in
a safer rejection of spurious clusters, but it could
also eliminate real weak sources. A good choice
can be reached by comparing the values of g among the
remaining clusters.

For the two clusters of Fig. 1 (lower-right panel) we
have g = 2.21 and g = 1.96 for the "strong" and the "faint"
source, respectively, while lowering the N_c value below N* =
10 spurious clusters, some also with g > 1.7, appear.

4.2 Bootstrap method and detection stability 

We can define another parameter to take into account that
the position of individual events in γ-ray images does not
coincide with the true incoming direction of the photons,
because the typical uncertainty due to the reconstruction of
pair trajectories is of the order of a few degrees. A γ-ray
image, therefore, must be considered as one possible realisation
of a large set of images of the "true" field in the sky. To take
this effect into account and to verify whether the detected
clusters were produced by chance aggregation of events or can
be considered associated with real sources, we introduced
a "bootstrap" technique that can be used to improve the
confidence in detections. Starting from the original image,
we produce a set of other possible images by generating an
equal number of photons whose coordinates are randomly
extracted with a probability density function approximating
the instrumental PSF, including its energy dependence.
We then apply the MST algorithm to these bootstrapped
fields, with the same filter selection as in the original one,
and as output we obtain new lists of candidate sources to
be compared with the original detections. Those having
positions within the PSF size are assumed to correspond to
the same original source. Candidate sources having a high
detection frequency in the bootstrapped images correspond
to rich and dense clusters and have a high probability of
being real, whereas sources characterised by a small number of
nodes and a low clustering degree g are generally detected
with a low frequency. The "detection stability" parameter s
is then given by the ratio of the number of detections inside
a source circle to the total number of bootstrapped fields.
Sometimes a single cluster is divided into a pair of smaller
clusters inside the source circle. In these cases, the smaller clusters
are counted as a single detection, so that s cannot
exceed unity. From our simulations we found that a
source detection can be considered reliable if the corresponding
cluster is detected in at least one half of the bootstrapped
fields, i.e. with s ≥ 0.5.
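
The two ingredients of this procedure can be sketched as follows, assuming a circular Gaussian as a stand-in for the instrumental PSF and flat (x, y) coordinates. The names `bootstrap_field` and `detection_stability` are hypothetical, and the MST detection step applied to each bootstrapped field is left to the caller.

```python
import math
import random

def bootstrap_field(points, sigma, rng):
    """Redraw every photon from a circular Gaussian of width sigma
    centred on its original position (a simplified PSF)."""
    return [(rng.gauss(x, sigma), rng.gauss(y, sigma)) for x, y in points]

def detection_stability(source, radius, detections_per_field):
    """Fraction of bootstrapped fields with at least one detected cluster
    centroid inside the source circle.

    Several centroids inside the circle in the same field count once,
    so the result never exceeds 1.
    """
    hits = sum(
        1 for detections in detections_per_field
        if any(math.dist(source, d) <= radius for d in detections)
    )
    return hits / len(detections_per_field)
```

A typical use would generate some number of fields with `bootstrap_field(points, sigma, random.Random(seed))`, run the separation and elimination filters on each, collect the centroids of the surviving sub-trees into `detections_per_field`, and keep the candidates with s ≥ 0.5.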

An example is given in Fig. 6, where we show a
bootstrap of the Fig. 1 field. In the upper left panel the original
field is shown, while the upper right panel is a bootstrapped
field computed using a probability density function equal
to the one used to generate the simulated sources, i.e. a
Gaussian with σ = 0.1. Note that the strong source is more
or less unaffected by the redistribution of photons, whereas
some other small clusters appear, but they are rejected by
further filtering. The lower panels show the clusters remaining
after the MST application (left: Λ_c = 1.3 Λ_m, N_c = 7; right:
Λ_c = Λ_m, N_c = 10) to this particular bootstrapped field:
note the different shape and number of clusters with respect
to the corresponding panels in Fig. 1. With the generation
of 100 bootstrap fields and the first selection of filters, we
obtain a detection stability of s = 1 for both sources. If
we choose the second set of filters the detection stability is
s = 1 and s = 0.55 for the "strong" and the "faint" source,
respectively.

The bootstrap method can also be used when the MST
algorithm detects two very close clusters, with a separation
between the centroids smaller than the PSF size, an effect likely
due to the presence of an edge just above the cutting
threshold. In this case a large fraction of the bootstrapped fields
has a single cluster at the position of these two sub-trees
and we can conclude that the splitting into two clusters was
accidental and that they correspond to a single source.

The statistical distribution of the expected values of
s in a random field cannot be computed in general because it
depends upon the instrumental response functions used when
the bootstrapped coordinates are generated. An estimate of the
threshold s value to reject unstable clusters must therefore be
obtained from a comparison of the resulting values. We also
noticed in our simulations that the source position
computed by averaging the centroids of the bootstrap replicas is
frequently, although not always, closer to the actual source
location than that derived from the MST application to the
original field. This method can therefore also be used to
refine the source coordinates.



5 APPLICATION OF MST TO EGRET FIELDS 

To test the source detection capability of our
implementation of the MST, we applied it to a real γ-ray image in which
several sources were already found. Due to the simplicity of
our algorithm (for example, we do not treat the energy
dependence of the point-spread function or the geometrical
distortions of the photon distribution in a flat projection), we
chose a high galactic latitude field in order to have a low,
uniform background, with no strong intensity gradient.
Moreover, this field lies around the celestial equator and
projection effects on photon coordinates are negligible, being
smaller than a few percent.

Fig. 7 shows the central portion of the EGRET Cycle
1 VP-11.0 field for photon energies higher than 100 MeV³,
observed between 03 and 17 October 1991 and comprising
the quasars 3C 273, 3C 279 and other sources. In the left
panel, blue squares mark the sources detected in this specific
pointing and reported in the Third EGRET Catalog (3EG,
Hartman et al. 1999), while the red squares are other 3EG
sources in the field that were not detected in this pointing (i.e. only
upper limits on the flux are given in the catalog).

A first application of the MST, using the filtering parameters
Λ_c = 0.9 Λ_m and N* = 12, gave the detection of 10 clusters,
shown as black circles in the same figure. We then used the
bootstrap method (see Sect. 4), drawing new photon
directions from a Gaussian distribution centered at each
original point and having σ = 2°. Note that a) the choice of
a Gaussian distribution for the bootstrapped photons is a
simplified and energy-averaged approximation of the
instrumental PSF, and b) the use of a circle for the computation
of s is only a zeroth-order approximation of the actual
photon distribution, which for real astronomical data would
rather be an ellipse, due to geometrical projection effects. Over
100 bootstrap fields were produced in this way, and only the seven
candidate sources with a clustering degree g > 1.70 and
with a detection stability s > 0.5 were retained. The MST
detected clusters satisfying these criteria are given in Fig.

³ A standard energy-dependent cut on zenith angle has been
applied to the original photon list, in order to remove the Earth albedo
γ-ray background (Esposito et al. 1997).
Figure 6. Upper left: The 500-point field of Fig. 1, with two simulated sources. Upper right: A bootstrap realisation of the same field,
with σ = 0.1. Lower left: Cluster selection after the application of the filters Λ_c = 1.3 Λ_m and N_c = 7 to the bootstrapped field.
Lower right: Cluster selection after Λ_c = Λ_m and N_c = 10 applied to the bootstrapped field. The meaning of symbols is the same as in Fig. 1.
Figure 7. Left: EGRET VP-11.0 pointing, 30°x30° square field, centered on RA 188.38, DEC +1.33. Blue/dark squares are the 3EG
sources detected in this pointing, red/light dark squares the other 3EG sources within this area. Black circles are the MST-detected
candidate sources. Right: The black diamonds are the positions of candidate sources calculated via the bootstrap method. Only the
sources with a clustering degree g > 1.70 and with a bootstrap detection stability s > 0.5 are retained. The cross marks the mean
coordinates of the two clusters that likely belong to the same source, see text for discussion.



8 R. Campana et al. 



7 (right panel) and Table 1, where we report the
coordinates, the number of nodes, the clustering degree g, the bootstrap
detection stability s, the 3EG counterpart and the possible
identification based on the new catalogue of blazars
Roma-BZCat (Massaro et al. 2005).

For this pointing, the 3EG catalogue reports five
sources, while two other sources are detected in other
pointings of the same field. Five of these seven 3EG sources were
also detected by the MST method and their angular distance
from the catalog positions is ≈ 1° or less. Four correspond
to 3EG sources detected in this pointing, while the fifth 3EG
source of this pointing, 3EG J1310-0517, which is reported as an
unidentified and possibly confused source, is not detected
(although there is a 12-node cluster that corresponds to this
source after the separation, thus just below the elimination value).
We also found some additional clusters. Two of them (RA=189.52°,
δ=5.78°; RA=190.40°, δ=4.92°) have the small separation
of ≈ 1° and lie close to 3EG J1236+0457, which was not
detected in this pointing by Hartman et al. (1999). The
distance between their centroids is less than the PSF radius, so
it is very likely that they belong to a single source: the
sub-tree of this source was split into two sub-trees by removing
one single edge having a length slightly exceeding Λ_c. This
is confirmed by the fact that in about one half of the
bootstrapped fields there is a single cluster near this position. For
this reason we can consider the two clusters as belonging to
a single source located approximately at the mean position
(RA=189.96°, δ=5.35°). This source is likely associated with
the z = 1.762 flat spectrum radio quasar BZQ J1239+0443.
We investigated whether this source is present in other EGRET
pointings containing the same region of the sky,
and detected it in VP-408.0 and VP-306.0. In the former
pointing the source was also detected by Hartman et al.
(1999), but not in the latter. However, we did not find it
in VP-407.0, where it was reported by Hartman et al. (1999).
This discrepancy is to be attributed mainly to the faintness
of these sources, which makes the detection extremely
sensitive to the actual source detection method used and its
threshold values.

The MST algorithm also detected a cluster at the coordinates
(RA=193.41°, δ=-2.47°), which is not in the 3EG catalogue.
This cluster has MST parameters comparable
to those of 3C 273 and there is no reason to reject it. We
searched without success in the Roma-BZCat and in the
NED database for possible counterparts and therefore it
remains unidentified. Of course, the possibility that it
is not genuine and originated from a random
clustering of events in the field cannot be excluded.



6 DISCUSSION 

We presented an application of a Minimal Spanning Tree
algorithm to the problem of source detection in γ-ray
images. This method does not involve the instrumental
response functions in the computation and works by recognizing
the regions of the sky where the arrival directions of photons
cluster. It has the advantage of a fast calculation but does
not directly provide estimates of the source flux. We have
shown that an MST-based algorithm is a viable method to
detect γ-ray sources both in simulated images and in real γ-ray
observations of the EGRET experiment on board Compton-GRO.
We proposed some tools to optimize the filtering
parameters and to assess the reliability of source detections,
like the clustering degree and the bootstrap detection
stability. These tools are based on a study, although empirical,
of the statistical properties of the Minimal Spanning Tree
on random fields.
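
The detection chain summarized above can be sketched in a few lines. This
is a minimal illustration under stated assumptions, not the authors'
implementation: the tree is built with Prim's algorithm on flat Euclidean
coordinates, the cut length Λc is taken as `cut_factor` times the mean edge
length Λm, sub-trees with fewer than `n_min` nodes are eliminated, and the
clustering degree g is assumed to be the ratio of Λm to the mean edge
length of the cluster. The names `cut_factor` and `n_min` and their default
values are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph: list of (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest distance found so far from the tree to node i
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def detect_clusters(points, cut_factor=0.8, n_min=5):
    """Separation and elimination filters; returns node lists and assumed g values."""
    edges = mst_edges(points)
    mean_len = sum(length for _, _, length in edges) / len(edges)  # Lambda_m
    cut = cut_factor * mean_len                                    # Lambda_c (assumed form)
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    kept = [e for e in edges if e[2] <= cut]      # separation: drop long edges
    for i, j, _ in kept:
        root[find(i)] = find(j)
    groups = {}
    for i, j, length in kept:
        groups.setdefault(find(i), []).append((i, j, length))
    clusters = []
    for sub in groups.values():
        nodes = sorted({k for i, j, _ in sub for k in (i, j)})
        if len(nodes) >= n_min:                   # elimination: drop small sub-trees
            sub_mean = sum(length for _, _, length in sub) / len(sub)
            clusters.append({"nodes": nodes, "g": mean_len / sub_mean})
    return clusters

# Toy field: a sparse regular background plus one compact group of 9 points.
background = [(2.0 * i, 2.0 * j) for i in range(5) for j in range(5)]
source = [(3.0 + 0.1 * i, 3.0 + 0.1 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
clusters = detect_clusters(background + source)
print(clusters)
```

In this toy field only the compact group survives both filters, with a
clustering degree well above unity. On real data the coordinates would
need a proper sky projection and the filter values would have to be tuned
as described in the text.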

The MST application to an EGRET field around the
two famous γ-ray loud quasars 3C 273 and 3C 279 found
almost all the 3EG sources already detected in the same
pointing and confirmed the presence of another source,
detected in a different pointing. We consider this result a good
indication that the MST method is particularly efficient. We
also found evidence of a possible new source with a
significance comparable to that of other well established sources.
We expect that future experiments with a better sensitivity,
like the LAT instrument on board GLAST, will confirm or
disprove this finding.

There are, however, several possible effects that make
source detection difficult and require even more attention
when the MST method is used. These problems can be
divided into four main categories: i) problems due to the
presence of strong sources; ii) problems arising from energy
spectra of the sources different from that of the background
(moreover, different spectral indices between the sources will
result in different detection probabilities, due to the
energy dependence of the PSF); iii) problems originating from
images with a non-homogeneous background; iv) problems
due to the geometrical distortions of the arrival directions of
celestial photons when projected onto the γ-ray telescope, so
that a circular shape is not necessarily appropriate for
cluster selection. At present we have not developed a well
established strategy to solve these problems, and in the
following we briefly discuss some aspects useful for the
understanding of the results.

One or more strong sources in the field have various
possible consequences. A first relevant effect is that they are
characterized by a high clustering degree and consequently
reduce the value of Λm with respect to the one expected in
the field if they were absent. A value of Λc very close to Λm
would then be suitable for detecting strong sources, but this
selection criterion could miss other possible sources of lower flux.
Another effect is the presence of possible "satellites" in the
surroundings of a strong source, even closer than expected
from the PSF, originated by cutting an edge whose length
is just smaller than Λc. For example, the cluster detected in
the EGRET field (see Sect. 4) with no obvious counterpart
is at a distance of about 3.3° from the strong radio quasar
3C 279, and therefore we cannot exclude that it could be a
satellite of the latter. Usually, the satellites do not have a
high frequency in the bootstrap fields.
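
The bootstrap frequency mentioned above can be estimated along the
following lines. This is a minimal sketch under stated assumptions, not
the procedure used for the results in this paper: photons are resampled
with replacement, a user-supplied detector `detect` (hypothetical; in
practice the MST filtering chain) returns a list of cluster centroids, and
a detection is counted when a centroid falls within `radius` of the
candidate position. The parameters `radius`, `n_boot` and `seed` are
illustrative.

```python
import math
import random

def bootstrap_stability(points, candidate, detect, radius, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which `detect` returns a centroid
    within `radius` of `candidate` (flat Euclidean coordinates assumed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resampled = rng.choices(points, k=len(points))  # sampling with replacement
        centroids = detect(resampled)
        if any(math.dist(c, candidate) <= radius for c in centroids):
            hits += 1
    return hits / n_boot

def toy_detect(points, n_min=5):
    """Hypothetical stand-in detector: centroid of the points inside the unit
    box, reported only if at least n_min points fall in it."""
    inside = [(x, y) for x, y in points if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0]
    if len(inside) < n_min:
        return []
    cx = sum(x for x, _ in inside) / len(inside)
    cy = sum(y for _, y in inside) / len(inside)
    return [(cx, cy)]

# Six photons in a compact group plus thirty spread over a wider field.
field = [(0.4 + 0.04 * k, 0.5) for k in range(6)]
field += [(2.0 + 0.5 * i, 2.0 + 0.5 * j) for i in range(6) for j in range(5)]
s = bootstrap_stability(field, candidate=(0.5, 0.5), detect=toy_detect, radius=0.5)
print(f"bootstrap detection stability s = {s:.2f}")
```

A genuine source is expected to be recovered in most resamples, while a
satellite produced by a marginal cut should appear in only a fraction of
them.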

The energy distribution of the photons also affects
the source detection, because the PSF of γ-ray telescopes
changes with energy, becoming much narrower at high
energies. This implies that sources with spectra harder than
the background are better detected in high energy
images because their clustering degree increases. Conversely,
sources with soft spectra give more dispersed clusters and
cannot be easily found.

Table 1. MST-detected clusters in EGRET pointing 110.0, with g > 1.7 and s > 0.5. For each candidate source we report the
celestial coordinates (Right Ascension and Declination, in degrees), the number of nodes of the corresponding cluster, the clustering degree g,
the bootstrap detection stability s, the Third EGRET Catalog (3EG) counterpart, and the identification with known sources.

RA       DEC        Nodes   g      s      3EG counterpart    Identification
191.20   -5.66      201     2.31   1.     3EG J1255-0549     3C 279
193.41   -2.47       21     1.88   1.
192.22   -7.82       37     1.83   1.     3EG J1246-0651     BZB J1243-0613
190.40    4.92(*)    16     1.79   1.     3EG J1236+0457     BZQ J1239+0443
188.75    3.14       23     1.71   1.     3EG J1229+0210     3C 273
189.52    5.78(*)    17     1.71   1.     3EG J1236+0457     BZQ J1239+0443
186.88   -2.23       15     2.35   0.92   3EG J1230-0247     BZQ J1236+0224

(*) These two clusters likely correspond to a single source, located at about (189.96, 5.35),
as indicated by the fact that their clusters are connected in about half of the bootstrapped fields.

Another class of problems is present when the background
is markedly non-homogeneous, as in the case where
the field contains a portion of the galactic disc. In this case,
using a single Λc over the whole image would correspond to a
long cutting in the dense region and to a short cutting in
the region of low density, with the consequence of missing
real sources and producing more spurious clusters.
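
A rough scaling argument, which is not part of the original analysis,
illustrates why a single cut length is inadequate in this case. Assuming
that each region is locally a uniform random field with surface density
ρ, the typical edge length scales as the mean spacing between points:

```latex
\Lambda_m \propto \rho^{-1/2}
\quad\Longrightarrow\quad
\frac{\Lambda_{m,\mathrm{dense}}}{\Lambda_{m,\mathrm{sparse}}}
  = \left(\frac{\rho_{\mathrm{sparse}}}{\rho_{\mathrm{dense}}}\right)^{1/2}
```

For example, a region with four times the photon density has typical
edges half as long, so a cut length tuned on the sparse region is too
long there, and one tuned on the dense region is too short elsewhere.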

A general approach to γ-ray source detection
is to use several methods, possibly based on
different techniques, and to compare their results. In this way
it will be possible to reduce the number of spurious
detections, because of the different criteria and a priori
assumptions applied in the source recognition. Accordingly, the MST
method can be used to obtain a quick list of regions where
photons cluster, which could correspond to possible sources,
to be studied independently with other methods.

There are other clustering algorithms that can be
applied to γ-ray source detection, like the Voronoi tessellation
(Icke & van de Weygaert 1987, Aurenhammer 1991). In
particular, this method is based on the construction of its dual
graph, the Delaunay triangulation, of which the MST is a subset.
We think, therefore, that at least in principle they would
provide similar results and that a combined figure of merit
for source detection should be defined.

Here we discussed gamma-ray astronomy as a prime
candidate for the application of the MST method, but it could
be even better suited to the study of data clustering
in ultra-high energy cosmic rays (UHECR) and
hemispherical neutrino experiments, which are characterized by the
absence of structured background. We also think that it will
be possible to extend the MST to higher dimensional spaces
by introducing time and energy as additional dimensions.
Basically there are two approaches: i) to search for clusters in
separate, dimensionally homogeneous subspaces, and then
to search for the intersection of the detected clusters, and
ii) to define a new metric for the tree edges that combines
the various dimensions in a way suitable for the
MST computation. Preliminary numerical attempts based
on the second approach, with energy as third coordinate,
seem to be very promising for identifying sources having
spectra different from that of the background. Another possible
3-dimensional generalization is to take into account also the
time, thus searching for variable or stable sources. We will
discuss a possible application of such a generalized MST in
a subsequent work.
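
As an illustration of the second approach, the following sketch defines
one possible edge metric that combines the two sky coordinates with
energy. It is purely hypothetical: the use of the logarithm of the energy
and the scaling parameter `energy_weight` (degrees per decade of energy)
are assumptions made here for the example, not the metric adopted in the
preliminary attempts mentioned above.

```python
import math

def combined_distance(p, q, energy_weight=1.0):
    """Hypothetical edge length between photons given as (x, y, energy).

    x and y are in degrees on a flat projection; the energy term is the
    difference of log10(energy) scaled by energy_weight (degrees per decade).
    """
    dx, dy = p[0] - q[0], p[1] - q[1]
    de = energy_weight * (math.log10(p[2]) - math.log10(q[2]))
    return math.sqrt(dx * dx + dy * dy + de * de)

# Two photons 0.5 degrees apart on the sky, one decade apart in energy (MeV).
a = (10.0, 5.0, 100.0)
b = (10.3, 5.4, 1000.0)
print(combined_distance(a, b, energy_weight=0.0))  # purely spatial distance
print(combined_distance(a, b, energy_weight=1.2))  # energy difference included
```

With such a metric, photons that are close on the sky but far apart in
energy are joined by longer edges, so the choice of the weight controls
how strongly spectral differences influence the tree.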